Classification of keyphrases from scientific publications using WordNet and word embeddings
Authors
Abstract
The ScienceIE task at SemEval-2017 introduced an epistemological classification of keyphrases in scientific publications, suggesting that research activities revolve around the key concepts of process (methods and systems), material (data and physical resources) and task. In this paper we present a method for classifying keyphrases according to the ScienceIE classification, using features derived from WordNet and word embeddings. The method outperforms the best system at SemEval-2017, although our experiments highlighted some issues with the collection.
RÉSUMÉ. Dans le contexte du challenge ScienceIE à SemEval-2017, ses organisateurs ont introduit une classification des phrases clés dans les publications scientifiques. Selon leur hypothèse, les activités de recherche tournent autour des concepts clés de “process” (méthodes, systèmes), “material” (ressources matérielles, données, produits) et “task” (problèmes, activités à poursuivre). Dans cet article, nous présentons une méthode pour la classification des phrases clés selon la classification donnée par ScienceIE, en utilisant des caractéristiques dérivées à partir de WordNet et de “word embeddings”. La méthode proposée dépasse le meilleur système au SemEval-2017 ; toutefois, nos expériences ont mis en évidence certains problèmes d’annotation avec la collection.
Similar resources
MayoNLP at SemEval 2017 Task 10: Word Embedding Distance Pattern for Keyphrase Classification in Scientific Publications
In this paper, we present MayoNLP’s results from the participation in the ScienceIE shared task at SemEval 2017. We focused on the keyphrase classification task (Subtask B). We explored semantic similarities and patterns of keyphrases in scientific publications using pre-trained word embedding models. Word Embedding Distance Pattern, which uses the head noun word embedding to generate distance p...
WING-NUS at SemEval-2017 Task 10: Keyphrase Identification and Classification as Joint Sequence Labeling
We describe an end-to-end pipeline processing approach for SemEval 2017’s Task 10 to extract keyphrases and their relations from scientific publications. We jointly identify and classify keyphrases by modeling the subtasks as sequential labeling. Our system utilizes standard, surface-level features along with the adjacent word features, and performs conditional decoding on whole text to extract...
PKU_ICL at SemEval-2017 Task 10: Keyphrase Extraction with Model Ensemble and External Knowledge
This paper presents a system that participated in SemEval 2017 Task 10 (subtask A and subtask B): Extracting Keyphrases and Relations from Scientific Publications (Augenstein et al., 2017). Our proposed approach utilizes external knowledge to enrich the feature representation of candidate keyphrases, including Wikipedia, the IEEE taxonomy and pre-trained word embeddings. Ensemble of unsupervised mode...
Searching Documents with Semantically Related Keyphrases
In this paper, we present a tool, called SemKPSearch, for searching documents by a query keyphrase and by keyphrases that are semantically related to that query keyphrase. By relating keyphrases semantically, we aim to provide users with an extended search and browsing capability over a document collection and to increase the number of related results returned for a keyphrase query. Keyphrases provid...
Injecting Word Embeddings with Another Language's Resource: An Application of Bilingual Embeddings
Word embeddings learned from a text corpus can be improved by injecting knowledge from external resources, while at the same time also specializing them for similarity or relatedness. These knowledge resources (like WordNet, Paraphrase Database) may not exist for all languages. In this work we introduce a method to inject word embeddings of a language with knowledge resource of another language b...